Reduce initial pipeline load time by 4-5x (1/3) #149
Changes
Adds `from_pretrained(low_cpu_mem_usage=True)` (akin to the transformers implementation, but greatly simplified) to `OmniGen` and `OmniGenPipeline`. Uses the `init_empty_weights` context manager when initializing the model. This avoids slow CPU weight initialization, particularly during `self.initialize_weights()`.

These weights are immediately overwritten when the `state_dict` is loaded, so initialization can be bypassed safely and without consequence.

Additionally, this can be achieved with no additional libraries beyond those in `requirements.txt`. As such, I set the default to `low_cpu_mem_usage=True`.
Results
From my tests, this change:
Cold Load
Hot Load
This is the first of 3 PRs I'm issuing to improve performance/fix errors. I've tried to keep each incremental change as small in scope as possible. PRs: 1. This, 2. #150, 3. #151